NOTES.PDQ[LSP,JRA]4 - www.SailDart.org

perm filename NOTES.PDQ[LSP,JRA]4 blob sn#133790 filedate 1974-11-22 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.SS(A project: Syntax-directed I/O,SDIO,P105:)
C00016 ENDMK
C⊗;
.SS(A project: Syntax-directed I/O,SDIO,P105:)
.BEGIN TURN ON "←";
It is frequently quite desirable and convenient to enter input and
receive output in  something other than sexprs.
Recall our diagram on {yon(P46)}.
%3eval%* demonstrated  that at least one style of evaluation can be mechanized.
What we wish to do now is examine the possibility of mechanizing the
encoding of the input and the decoding of the output.  

Consider for example, the problem of simplification of algebraic expressions.
Many rather sophisticated simplifiers have been written. Assume that we have
one named %3simplify%* which expects sexpr input and gives sexpr output. Thus
for example

.BEGIN SELECT 3;CENTERIT;
←(3+4)*x + x =%4I %*=> (PLUS (TIMES (PLUS 3 4) X) X) =%4simplify%*=> (TIMES 8 X) =%4O %*=> 8*x.
.END

We would like transformations I and O done automatically. Notice LISP is itself
a candidate for such a task.

.BEGIN SELECT 3;CENTERIT;
←cons[A;B] =%4I %*=> (CONS (QUOTE A)(QUOTE B)) =%4eval%*=> (A . B) =%4O %*=> (A . B)
.END

Transformation O is the identity in this case.

In many cases the automatic generation of the I and O transformations can be
done. Such a program, SDIO (Syntax#Directed#Input-Output),
was written by Lynn Quam of the Stanford AI Laboratories
in 1968. It is the forerunner of the more ambitious MLISP2 project.
The basic idea is simple enough. We will assume that the input and output
syntax is specified in BNF. With each BNF equation we will associate 
semantics describing the sexpr representation. The input transformation (parser) 
will
use this information to build the representation; and the output transformation
(unparser) will map the internal representation back.

The importance of syntax directed I/O  should not be minimized. One aspect of
SDIO is the ablility to define new data types in LISP. Assume we which to
represent a structure as a particularly horrible list structure. We can give
augmented BNF equations specifying the external representation and the translation
to the underlying representation. Clearly when outputing these structures 
we do not want to see the internal representation. This can be particularly
annoying when we are debugging; when we are trying to concentrate on a misbehaving
algorithm we do not want to be distracted by incomprehensible output.
Thus syntax directed output or unparsing is at least as important as input.
Notice too that SDIO is a simple way to approach abstract data structures.

Perhaps the easiest introduction to SDIO is to examine an example. 
Consider the  proposed
simplification task above. The "natural" input syntax can be described in BNF.
We have given closely related syntax equations
on {yon(P68)}. We  will display  a few equations augmented by SDIO semantics. 
For example:

.BEGIN TABIT2(17,55);

   <EXP>\::= <EXP> + <TERM>\=>(PLUS EXP TERM)
\::= <TERM>\=>*

   <TERM>\::= <NUMBER>\=>*

.END
To the input parser the first BNF equation means: whenever the right hand side 
is recognized, reduce that occurrence to the left hand side and associate with it
the list consisting of the atom PLUS, the sexpr associate with the occurrence of
<EXP>, and the sexpr associated with the occurrrence of <TERM>.
The second equation means reduce <TERM> to <EXP> associating whatever sexpr
is attached to <TERM> with that occurrence of <EXP>. Hπpthe third equation
we assume that <NUMBER> is a syntactic type recognized by the scanner, returning
that number as the semantic value.
For example if such a parser were given 2#+3+44 it should return the list
%3(PLUS 2 (PLUS 3 44))%*.

The unparser uses these equations in the opposite manner. It will see a sexpr
and will attempt to match that to the description of the semantics, outputting
an instance of the BNF if successful.

The SDIO program will take such an augmented set of BNF equations and
generate a parser and an unparser for the language.
This project involves writing such a SDIO program. We describe a basic
SDIO program and suggest extensions and improvements.

The best way to describe the format of  SDIO input is to give an SDIO
description:

.BEGIN TABIT2(17,55);

<RULES>\::= END\=>NIL
\::=<RULE><RULES>\=>(RULE . RULES)

<RULE>\::= <LFPT><RTLST>\=>(LFPT RTLST)

<RTLST>\::= ::=<RTPT><SEXPR><RTLST>\=>((RTPT SEXPR) . RTLST)
\::=\=>NIL

<LFPT>\::= <<ID>>\=>*

<RTPT>\::= "=>\=>NIL
\::= <RPELEM><RTPT>\=>(RPELEM . RTPT)

<RPELEM>\::= <<ID>>\=>*
\::=<ID>\=>(SPWD ID)
\::=""<CHAR>\=>(QCH CHAR)
\::=<CHAR>\=>(CH CHAR)

<SEXPR>\::= <ATOM>\=>*
\::= (<SEXPRLIST>)\=>*

<SEXPRLIST>\::= <ATOM>	\=>*
\::= <SEXPR> <SEXPRLIST>\=>(SEXPR . SEXPRLIST)
\::=\=>NIL

END
.END

So the SDIO input is a sequence of augmented BNF equations terminated with END.
What SDIO program sees is a sexpr representation of these equations. 
The sample equations for <EXP> above would pass the following to the SDIO program.

.BEGIN TABIT3(10,13,25);

\(\(EXP(\((EXP (CH +) TERM) (PLUS EXP TERM)) 
\\\((TERM) NIL)))
\\(TERM(\((NUMBER) NIL)))))
.END
What the SDIO program outputs are the parser and unparser.

The elements of the BNF equations in SDIO are rather standard: syntactic variables,
identifiers bracketed by "<" and ">"; special words, which are identifiers; and
special characters, which are preceeded by  " if they conflict with the special
characters of the BNF.

The elements of the semantics are: unbracketed syntactic variables occurring
in the RHS of the associated BNF equation; other identifiers, taken as constants;
NIL, the LISP atom; notation for %3cons%*-ing, %3(###.###)%*; notation for
making a list, %3(e%41%*###e%4n%*)%1; the character *, described above.

.CENT(Project)

Write such a SDIO program. You should consult local documentation and the
LISP wizard when building the basic I/O routines like <NUMBER>, <CHAR> or <ID>.

.CENT(First Extension)

You should have noticed already that the basic SDIO program fails to
distinguish two occurrences of the same syntactic variable on the RHS of an
equation. Thus an equation like:

.BEGIN TABIT1(8);

<ZIP>\::= <ZAP> <FOO> <ZAP>  must be replaced by the pair:

<ZIP>\::= <ZAP> <FOO> <ZAP1>
<ZAP1>\::=<ZAP>
.END
This trick is called ⊗→stratification↔←. It is a syntactic trick, adding
nothing to the semantics.


Add notation to the semantics of your SDIO program to handle RHS with 
multiple occurences of syntactic variables. Modify your parser generators
accordingly.

.CENT(Second Extension)

Besides building a sexpr representation of the input, it is frequently
desirable to generate other information during the input parse. Lists of
occurences of operators or other tables are commonly needed. The additional
information could be discovered by examination of the completed parse tree
but that requires reexamination of the tree. It is much more efficient to
do as much as possible on a single pass.

Introduce notation which will allow execution of arbitrary LISP code
as the parse is in progress. That code should be able to manipulate
any of the semantic properties associated with the syntactic variables appearing
in the RHS of the associated  syntax equation.

.CENT(Third extension)

While it is obviously advantageous to produce output in the language
described by the BNF equations rather than the sexpr form, formatting
of the output can be equally beneficial.
We should like to be able to specify formatting information in SDIO
such that spacing and line-length are controlled.

One proposal is to embed spacing and line-feed control characters in the
BNF equations. The spacing character is "→" and the line-feed is "↓".
The "↑" sets the indentation point for the string on its right; and the "→"
followed by a digit says space over than number of spaces from the
last indentation point if the remaining  space on the line is not sufficient to
contain all text specified by the remaining RHS of the equation.
"→"0, meaning go to the indentation point can be written "→".

For example consider the following:
.BEGIN TABIT2(10,24);

\<EXPR>\::= <ID>( ↓ <EXPR_LIST>)
\\::= <ID>

\<EXPR_LIST>\::= ↓<EXPR> , →<EXPR_LIST>
\\::= ↓<EXPR>
.END

These equations, when used to drive an unparser could result in:

.BEGIN TABIT1(7);

mumf(\a,
\foobaz(garp(b),
\bletch(a,b,c),
\d)
.END

Extend SDIO to handle formatting of output.
.END